Received: from iesd.auc.dk by optima.cs.arizona.edu (5.65c/15) via SMTP id AA15157; Thu, 8 Apr 1993 12:29:43 MST
Received: from yellow.iesd.auc.dk by iesd.auc.dk with SMTP id AA21807 (5.65c8/IDA-1.5/MD for <tsql@cs.arizona.edu>); Thu, 8 Apr 1993 21:29:00 +0200
Date: Thu, 8 Apr 1993 21:29:00 +0200
From: "Christian S. Jensen" <csj@iesd.auc.dk>
Message-Id: <199304081929.AA21807@iesd.auc.dk>
To: tsql@cs.arizona.edu
Subject: TSQL Benchmark, Task 3.

********************************************************************
* The TSQL Benchmark Initiative -- Task 3: Taxonomy *
********************************************************************

Three initial tasks were defined in connection with version 1 of the TSQL benchmark.

Task 1: Decide on a db schema
Task 2: Decide on an instance for the schema
Task 3: Decide on a classification of benchmark queries

When these are completed, we still need to enter queries into the benchmark (Task 4). On the other hand, the deadline is firm--the benchmark must be finished in time for the TDB Workshop in June.

Task 4: Populate the benchmark with queries of each type identified in the taxonomy.

Task 1 is essentially completed, and we now have a consensus schema that is well-suited for the benchmark. While discussions are currently being carried out wrt Task 2, we can also start discussing Task 3.

For that purpose, the straw proposal for a taxonomy of benchmark queries has been completed. This proposal is appended below.

As for the other tasks, comments, improvements, suggestions, etc. are very welcome.

Best regards,

Christian S. Jensen
Aalborg University
csj@iesd.auc.dk

\documentstyle[11pt]{article}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% This document contains what is intended to evolve into the
% consensus taxonomy of benchmark queries for the first version of the
% TSQL Benchmark. It addresses Task 3 of the initial tasks and should
% be appended, as an individual section, to the document ``The TSQL
% Benchmark.''
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\addtolength{\textwidth}{1.485in}%{1.2in}
\setlength{\oddsidemargin}{.1in}%{.3in}
\setlength{\evensidemargin}{.1in}%{.3in}
\addtolength{\topmargin}{-.85in} %{-1.35in}
\addtolength{\textheight}{1.8in} %{2.8in}

\long\def\comment#1{}

\newenvironment{BNF}{\vspace{-\partopsep}\addtolength{\baselineskip}{+4pt}
\samepage\begin{tabbing}
%\quad
\ \ \ \ \=\ \ \ \ \=\ \ \ \ \=\ \ \ \ \=\ \ \ \ \=\ \ \ \ \=\ \ \ \ \=\ \ \ \ \=
\+\kill}{\end{tabbing}\vspace{-\partopsep}\vspace{-\topsep}\vspace{-\parsep}}

% => in roman
\def\arrow{\char'75\char'76\relax}
% ::= in roman
\def\is{{\rm \verb.:.\verb.:.\char'75\relax}}
% { in roman
\def\lbr{$\bigl\{$}
% } in roman
\def\rbr{$\bigr\}\;$}
% }* in roman
\def\starbr{$\bigl\}${\rm *}$\;$}
% }? in roman
\def\quesbr{$\bigr\}{}^?$}
% }+ in roman
\def\plusbr{$\bigr\}{}^+$}
% | in roman
\def\vbar{$\bigl|\;$}
% 'chars
\def\qt#1{`{#1}'}
% nonterminals (arg is the name)
\def\nt#1{$<${\rm #1}$>$}
%typewriter font
\def\ttt#1{{\tt #1}}
%comment
\def\com#1{{\tt /$\ast$} {\footnotesize #1}}
%double square brackets
\def\dsl{{\tt [\hspace{-4.5pt}[ \hspace{-1pt}}}
\def\dsr{{\tt ]\hspace{-4.5pt}]}}
\def\dsrs{{\tt ]\hspace{-4.5pt}]\hspace{4pt}}}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% PAPER START
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{document}

\title{\Large\bf The TSQL Benchmark \\ Taxonomy DRAFT}
\author{}
\date{April 8, 1993}
\maketitle

\section{Classification of Benchmark Queries}

A classification of benchmark queries will be based on a comprehensive taxonomy of queries. First, criteria for such a taxonomy are outlined. Next, the taxonomy itself is presented.
Because the full taxonomy is too fine-grained to use directly, categories are then merged into a manageable number of groups, which can subsequently be used for classification.

\subsection{Criteria}

Three criteria for an appropriate taxonomy of benchmark queries are suggested.

\begin{itemize}
\item{} The taxonomy should be schema and instance independent. This criterion helps ensure that the taxonomy will persist when the benchmark database schema evolves as new versions appear. Ideally, this will allow for an incremental mode of work, where only new queries need to be categorized and existing queries do not need re-categorization.
\item{} The taxonomy should provide comprehensive coverage of benchmark queries. Comprehensiveness is desirable because it avoids holes and points to many categories of queries.
\item{} The taxonomy should be useful when structuring the presentation of benchmark queries. Most importantly, it should provide sufficient structure. Thus, taxonomies that have only a few categories and that map many queries to single categories are problematic. If the number of categories is excessive for presentation purposes, classes of categories may be identified with individual sections.
\end{itemize}

\subsection{The Taxonomy}

First, the taxonomy is characterized as having a projection (output) component and a selection component, much like SQL. Then each component is covered in turn. Finally, the full taxonomy is summarized and a notation for naming individual categories is defined.

\subsubsection{Top-level Taxonomy}

At the top level, the taxonomy is divided into two orthogonal parts, namely a part where queries are categorized according to their {\em output component} and a part where the categorization is based on the {\em selection component}. Thus, a category is described by two components, as illustrated in Figure~\ref{fig:top}.
\begin{figure}[htbp]
\[
\{ <\mbox{output component}> \} \times \{ <\mbox{selection component}> \}
\]
\caption{Top-level Description of Benchmark Taxonomy}
\label{fig:top}
\end{figure}

This top-level design reflects the SQL template (i.e., {\tt SELECT} \ldots {\tt FROM} \ldots {\tt WHERE} \ldots). The first component categorizes the contents of the {\tt SELECT} clause, and the second component categorizes the contents of the {\tt WHERE} clause. No component is needed to reflect the {\tt FROM} clause where tuple variables are defined. The two components are orthogonal only in the same sense that the {\tt WHERE} and {\tt SELECT} clauses of a particular query are orthogonal.

\subsubsection{Output-based Taxonomy}

The output-based taxonomy is intended to reflect the part of queries where the format of the resulting tuples is specified. The taxonomy is described in Figure~\ref{fig:pro1} and is explained in the following.

\begin{figure}[htbp]
\begin{center}
\leavevmode
\[
\left\{ \begin{array}{c}
\mbox{\underline{explicit-attribute component}} \\
\mbox{none} \\
\mbox{projected} \\
\mbox{complete}
\end{array} \right\}
\times
\left\{ \begin{array}{c}
\mbox{\underline{valid-time component}} \\
\mbox{none} \\
\left\{ \begin{array}{c}
\mbox{\underline{type}} \\
\mbox{event} \\
\mbox{interval} \\
\mbox{element}
\end{array} \right\}
\times
\left\{ \begin{array}{c}
\mbox{\underline{value}} \\
\mbox{derived} \\
\mbox{imposed}
\end{array} \right\}
\end{array} \right\}
\]
\end{center}
\caption{Output-based Taxonomy}
\label{fig:pro1}
\end{figure}

The idea is to distinguish between queries based on the format of the result tuples. A tuple may include an explicit-attribute component and a valid-time component, each of which is considered next.

If present, the explicit-attribute component may contain all attributes in the argument relation (multiple relations are discussed below) or it may contain a subset of the attributes in the argument relation.
In the first case, the explicit-attribute component is ``complete,'' and in the second, it is ``projected.'' To exemplify, consider a tuple stating that Ed is in the Book department from 1/1/82 to 12/31/84. Here ``Ed'' and ``Book'' constitute the explicit-attribute component, and ``1/1/82'' and ``12/31/84'' constitute the valid-time component. If the argument relation contains an attribute ``Salary'' in addition to the Name and Department attributes, this result is projected. If several relations are used in a query, the argument relation is the Cartesian product of these, i.e., the schema is the concatenation of the schemas of the relations used in the query.

The valid-time component of a tuple may be of three types. First, it may be an event, i.e., a single time value (e.g., 3/1/83). Second, it may be an interval, i.e., a sequence of consecutive time values (e.g., as above). Third, it may be an element, i.e., a set of time values which may be described by a set of intervals (e.g., 1/1/82 to 12/31/84, 2/1/85 to 3/31/85, and 5/1/86 to 5/31/86).

Orthogonally, the value of a valid-time component may be derived or imposed. A derived value is computed solely in terms of the valid-time components of the tuples in the argument relation. An imposed value is computed by explicit assignment in the query.

Note that at least one of the two components must be present in the result of a query. This part of the taxonomy results in 20 mutually exclusive categories.

The distinctions above are based on the schema of result relations. It is also possible to distinguish between the cardinalities of result relations, e.g., between set-valued and single-tuple-valued results.

\subsubsection{Selection-based Taxonomy}

The selection component is divided into two parts, one for valid-time selection and one for selection not involving valid time. See Figure~\ref{fig:selt}.
\begin{figure}[htbp]
\[
\{ <\mbox{valid-time selection}> \}^\ast \times \{ <\mbox{non-temporal selection}> \}^\ast
\]
\caption{Top-level Selection-based Taxonomy}
\label{fig:selt}
\end{figure}

Both parts are based on the same observation. In general, a selection predicate is built from atomic selection predicates using logical operators (e.g., {\tt and}, {\tt or}, and {\tt implies}) and parentheses. Both parts categorize queries based on the atomic predicates used in the queries. As several types of atomic predicates may be used in the same query, queries generally fall into multiple categories (as indicated in Figure~\ref{fig:selt} by the Kleene star, ``${}^\ast$''). We examine each part of the selection-based taxonomy in turn.

Atomic valid-time selection predicates are assumed to be of the form
\[ arg_1 \mbox{\tt ~op~} arg_2 \hspace*{2mm},\]
where {\tt op} is some comparison operator (e.g., {\tt precedes} and {\tt contains}). It is assumed that $arg_1$ is the valid time of the data, and restrictions are imposed based on the type of the comparison operator, on the origin of $arg_2$, and on the type of $arg_2$. Figure~\ref{fig:sel2} outlines the categories.

\begin{figure}[htbp]
\begin{center}
\leavevmode
\[
\left\{ \begin{array}{c}
\mbox{\underline{type of comparison operator}} \\
\mbox{duration-based} \\
\mbox{ordering-based} \\
\mbox{containment-based}
\end{array} \right\}
\times
\left\{ \begin{array}{c}
\mbox{\underline{type of $arg_2$}} \\
\mbox{event} \\
\mbox{interval} \\
\mbox{element}
\end{array} \right\}
\times
\left\{ \begin{array}{c}
\mbox{\underline{origin of $arg_2$}} \\
\mbox{explicitly supplied in query} \\
\mbox{user-defined attribute value} \\
\mbox{computed from other valid times}
\end{array} \right\}
\]
\end{center}
\caption{Valid-time Selection-based Taxonomy}
\label{fig:sel2}
\end{figure}

Three types of comparison operators are identified. First, a comparison operator may be duration-based.
For example, the operator {\tt spanExceeds} returns true if the duration of the first argument is equal to or larger than the duration of the second argument. Second, comparison operators may be based on ordering. Operators in this category include {\tt precedes} and {\tt meets}. The first applies to all timestamps and evaluates to true if the largest time in the first argument is smaller than the smallest time in the second argument. Operator {\tt meets} appears to be useful only for events and intervals. Two timestamps meet if they are not separated by any event (i.e., may be coalesced). Third, operators based on containment include {\tt =} (identity), {\tt overlaps}, and {\tt contains}.

The second argument ($arg_2$) may be an event, an interval, or an element. Also, it may come from three sources. First, it may be supplied directly in the query, as a constant. Second, it may be the value of a user-defined time attribute in an argument tuple. Note that this is only possible for events if first normal form is required. Third, like the first argument, the second argument may be computed from valid times in the argument tuples.

If the three types of categories are completely orthogonal, this part of the taxonomy will contribute a total of 27 categories. However, it may be debated whether intervals and elements may be used as values of user-defined attributes (resulting in non-1NF relations).

The final part of the selection-based taxonomy categorizes queries based solely on the part of the selection component that involves only ordinary, non-temporal selection. Many possibilities for categorization exist. Below, in Figure~\ref{fig:sel1}, we distinguish between four significant types of atomic selection predicates. First, an attribute may be compared with a constant, supplied by the user. Second, two attribute values in the same relation may be compared. Third, a primary key value may be compared with a matching foreign key value.
Fourth, arbitrary attributes of possibly distinct relations may be compared. In the figure, $\theta ::= \; < | > | \leq | \geq | = \;$, i.e., equality, one of the two inequality operators, or a combination of the two. If we distinguish between situations where only equality is involved and situations where inequality is involved, this gives 8 categories.

\begin{figure}[htbp]
\begin{center}
\leavevmode
\[
\left\{ \begin{array}{c}
\mbox{\underline{non-temporal attribute value selection}} \\
att~\theta~\mbox{\em Constant} \\
att_1~\theta~att_2 \\
att_k~\theta~att_{fk} \\
att(rel_1)~\theta~att(rel_2)
\end{array} \right\}
\times
\left\{ \begin{array}{c}
\mbox{\underline{comparison operator, $\theta$}} \\
\mbox{only equality } (=) \\
\mbox{inequality } (<>)
\end{array} \right\}
\]
\end{center}
\caption{Non-temporal Selection-based Taxonomy}
\label{fig:sel1}
\end{figure}

\subsubsection{Additional Contributions---TEMPORARY}

The distinction between grouped and ungrouped queries has not been integrated into the taxonomy. To do that, definitions of these categories are needed.

\subsection{Overview and Naming of Categories}

Each query has a single output component, zero or more valid-time selection components (one per such operator), and zero or more non-temporal selection components (one per such operator). The taxonomy is summarized in Figure~\ref{fig:tax}. There, the names introduced in the taxonomy are used along with punctuation in order to name a category.
\begin{figure}[htbp]
\begin{BNF}
\nt{non-t selection} \=\is\ \= \kill
\nt{category} \> \is \> \nt{output} \qt{/} \lbr \nt{v-t selection} \starbr \qt{/} \lbr \nt{non-t selection} \starbr \\
\nt{output} \> \is \> \qt{(} \= \lbr None \vbar Projected \vbar Complete \rbr \qt{,} \` \com{explicit-attribute component} \\
\>\>\> \lbr \= None \vbar \` \com{no valid-time attribute} \\
\>\>\>\> \lbr Event \vbar Interval \vbar Element \rbr \qt{,} \` \com{type of valid-time attribute} \\
\>\>\>\> \lbr Derived \vbar Imposed \rbr \rbr \qt{)} \` \com{value of valid-time attribute} \\
\nt{v-t selection} \> \is \> \qt{(} \lbr Duration \vbar Ordering \vbar Containment \rbr \qt{,} \` \com{operator type}\\
\>\>\> \lbr Event \vbar Interval \vbar Element \rbr \qt{,} \` \com{argument type} \\
\>\>\> \lbr Explicit \vbar User-defined \vbar Computed \rbr \qt{)} \` \com{argument origin} \\
\nt{non-t selection} \> \is \> \qt{(} \lbr \qt{=} \vbar \qt{$<>$} \rbr \qt{,} \` \com{operator type} \\
\>\>\> \lbr Constant \vbar Single \vbar Foreign \vbar Arbitrary \rbr \qt{)} \` \com{argument types} \\
\end{BNF}
\caption{Overview of the Taxonomy used for Naming Categories}
\label{fig:tax}
\end{figure}

To exemplify the use of Figure~\ref{fig:tax} for naming categories, consider the query ``When was Ed Manager of the Toy Department?'' This query is in the category shown next (with no valid-time selection).
\begin{center}
(None, Element, Derived) // (=, Constant)
\end{center}

It may be observed that the taxonomy gives rise to a large number of categories. For example, assuming a single non-temporal operator and no valid-time operators, there are $20 \times 8 = 160$ categories. Adding a single valid-time operator while assuming orthogonality yields an additional 4320 categories! As a result, it becomes necessary to create classes of categories which then may be used for classifying the benchmark queries.
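As a check, the counts quoted above can be reproduced from the individual parts of the taxonomy, assuming (as the text does) that the components within each part are fully orthogonal. The subtraction in the first line removes the single combination with neither an explicit-attribute nor a valid-time component, which is excluded because at least one of the two must be present. The symbols $N_{\rm out}$, $N_{\rm vt}$, and $N_{\rm nt}$ are introduced here only for this tally and are not part of the proposed notation.
\begin{eqnarray*}
N_{\rm out} & = & 3 \times (1 + 3 \times 2) - 1 = 20 \\
N_{\rm vt} & = & 3 \times 3 \times 3 = 27 \\
N_{\rm nt} & = & 4 \times 2 = 8 \\
N_{\rm out} \times N_{\rm nt} & = & 20 \times 8 = 160 \\
N_{\rm out} \times N_{\rm vt} \times N_{\rm nt} & = & 20 \times 27 \times 8 = 4320
\end{eqnarray*}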
One approach would be to name a {\em class} of categories of queries by simply replacing one or more of the entries with the Kleene star (``*''), e.g.,
\begin{center}
(None, Element, Derived) / (*,*,*) / (=, Constant)
\end{center}
The above query category would be in this class. In the next section, we define the classes to be used in the benchmark.

\subsection{Forming Classes from Categories}

The idea is to remove distinctions from the comprehensive taxonomy until a suitable number of classes is obtained. Figure~\ref{fig:tax2} is thus a reduced version of Figure~\ref{fig:tax}.

\begin{figure}[htbp]
\begin{BNF}
\nt{reduced v-t selection} \=\is\ \= \kill
\nt{class-name} \> \is \> \nt{reduced output} \qt{/} \lbr \nt{reduced v-t selection} \starbr \\
\nt{reduced output} \> \is \> \qt{(} \= \lbr None \vbar Proj/Comp \rbr \qt{,} \` \com{explicit-attribute component} \\
\>\>\> \lbr None \vbar Not empty \rbr \qt{)} \qt{/} \` \com{valid-time attribute component} \\
\nt{reduced v-t selection} \> \is \> \qt{(} \> \lbr Duration \vbar Other \rbr \qt{,} \` \com{comparison operator type} \\
\>\>\> \lbr Event \vbar Interval \vbar Element \rbr \qt{,} \` \com{argument type} \\
\>\>\> \lbr Computed \vbar Other \rbr \qt{)} \` \com{argument origin}
\end{BNF}
\caption{Overview of the Classification of Queries}
\label{fig:tax2}
\end{figure}

The second and third lines concern output. Only the presence or absence of explicit attributes and of timestamps is distinguished, leading to three categories (the combination in which both are absent is excluded). The last three lines concern valid-time selection (non-temporal selection is disregarded). Comparison operators may be duration-based or not; arguments may be of event, interval, or element type; and the arguments may or may not derive from valid times of tuples.

\end{document}